Section: New Results

Mitigating the Cost of Identifiers in Sequence CRDT

Participants: Matthieu Nicolas, Gérald Oster, Olivier Perrin.

To achieve high availability, large-scale distributed systems have to replicate data and minimise coordination between nodes. Both the literature and industry increasingly adopt Conflict-free Replicated Data Types (CRDTs) to design such systems. CRDTs are data types that behave like traditional ones, e.g. the Set or the Sequence. Unlike traditional data types, however, they are designed to natively support concurrent modifications. To this end, their specification embeds a conflict-resolution mechanism.

To resolve conflicts deterministically, CRDTs usually attach identifiers to the elements stored in the data structure. Depending on the kind of CRDT, identifiers have to comply with several constraints, such as being unique or belonging to a dense order. These constraints may prevent the size of identifiers from being bounded: as the number of updates increases, identifiers grow. This leads to performance issues, since the efficiency of the replicated data structure decreases over time.
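
The growth of densely ordered identifiers can be illustrated with a minimal sketch. It is a simplified, single-replica scheme written for this illustration, not the identifier format of any particular Sequence CRDT: identifiers are tuples of integers compared lexicographically, and the names `BASE`, `between` and `prepend_many` are hypothetical. It omits the replica-specific components that real identifiers need for uniqueness.

```python
BASE = 16          # assumed digit range for this sketch: 0 .. BASE - 1
BEGIN = ()         # sentinel smaller than every identifier
END = (BASE,)      # sentinel greater than every identifier


def between(lo, hi):
    """Return a tuple strictly between lo and hi (lexicographic order).

    Assumes lo < hi and that neither identifier ends with 0, which holds
    for every identifier produced by this function.
    """
    out = []
    hi_bound = True  # False once the prefix built so far is already below hi
    i = 0
    while True:
        a = lo[i] if i < len(lo) else 0
        b = hi[i] if hi_bound and i < len(hi) else BASE
        if b - a > 1:
            # There is room at this depth: pick the middle digit and stop.
            out.append((a + b) // 2)
            return tuple(out)
        # No room at this depth: copy lo's digit and go one level deeper.
        out.append(a)
        if a < b:
            hi_bound = False
        i += 1


def prepend_many(n):
    """Insert n elements at the head of the sequence; return ids in order."""
    ids = []
    for _ in range(n):
        first = ids[0] if ids else END
        ids.insert(0, between(BEGIN, first))
    return ids


ids = prepend_many(100)
print(len(ids[-1]), len(ids[0]))  # length of the first and the last inserted id
```

In this sketch, each depth offers only a few free positions, so repeated insertions at the same place force identifiers to gain components. A dense order always leaves room for a new identifier, but only at the price of unbounded length.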

To address this issue, we propose a new Sequence CRDT that embeds a renaming mechanism. It enables nodes to reassign shorter identifiers to elements in an uncoordinated manner. Our experimental results show that this mechanism reduces the overhead of the replicated data structure and eventually bounds it.
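
The core idea of renaming can be sketched as follows, under strong simplifying assumptions. The sketch only covers the local step on a single replica: it maps each identifier of a sorted sequence to a short one that preserves the order. The functions `rename` and `total_size` and the sample identifiers are hypothetical, and the sketch deliberately leaves out what makes the actual mechanism uncoordinated, namely handling operations issued concurrently with the rename.

```python
def rename(ids):
    """Map each identifier of a sorted sequence to a short, order-preserving one.

    The returned dict plays the role of a translation table from old to new
    identifiers (an assumption of this sketch, not the published design).
    """
    return {old: (rank,) for rank, old in enumerate(ids, start=1)}


def total_size(ids):
    """Total number of components stored across all identifiers."""
    return sum(len(ident) for ident in ids)


# Hypothetical identifiers after many edits, listed in sequence order.
old_ids = [(0, 0, 8), (0, 4), (0, 8), (1,), (2, 7, 3)]

mapping = rename(old_ids)
new_ids = [mapping[ident] for ident in old_ids]

print(total_size(old_ids), "->", total_size(new_ids))
```

Because the new identifiers follow the rank of each element, the order of the sequence is unchanged while every identifier shrinks to a single component.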

To validate the proposed renaming mechanism, we conducted an experimental evaluation measuring its performance on several aspects: (i) the size of the data structure; (ii) the integration time of the rename operation; (iii) the integration time of insert and remove operations. For (i) and (iii), we use LogootSplit as the baseline. The results are encouraging: integration time is far shorter with the renaming mechanism, even when the time spent applying the rename operation is taken into account.